Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 777715 |
| Missing cells | 240048 |
| Missing cells (%) | 1.5% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 140.7 MiB |
| Average record size in memory | 189.7 B |
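The percentage and per-record figures in the table above follow from the raw counts. A minimal sketch of that arithmetic, assuming "Missing cells (%)" is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 20
n_observations = 777_715
missing_cells = 240_048
total_size_mib = 140.7

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")               # 1.5%
print(f"Average record size: {avg_record_bytes:.1f} B")   # 189.7 B
```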
Variable types
| Numeric | 7 |
|---|---|
| Categorical | 11 |
| Boolean | 2 |
Alerts
| Alert | Type |
|---|---|
| FLAG_MOBIL has constant value "1" | Constant |
| CNT_CHILDREN is highly overall correlated with CNT_FAM_MEMBERS | High correlation |
| DAYS_EMPLOYED is highly overall correlated with NAME_INCOME_TYPE and 1 other fields | High correlation |
| CNT_FAM_MEMBERS is highly overall correlated with CNT_CHILDREN | High correlation |
| CODE_GENDER is highly overall correlated with FLAG_OWN_CAR and 1 other fields | High correlation |
| NAME_INCOME_TYPE is highly overall correlated with DAYS_BIRTH and 1 other fields | High correlation |
| OCCUPATION_TYPE is highly overall correlated with CODE_GENDER | High correlation |
| DAYS_BIRTH is highly overall correlated with NAME_INCOME_TYPE and 1 other fields | High correlation |
| FLAG_OWN_CAR is highly overall correlated with CODE_GENDER | High correlation |
| OCCUPATION_TYPE has 240048 (30.9%) missing values | Missing |
| CNT_CHILDREN has 540639 (69.5%) zeros | Zeros |
| MONTHS_BALANCE has 24672 (3.2%) zeros | Zeros |
Reproduction
| Analysis started | 2023-03-05 22:51:23.499218 |
|---|---|
| Analysis finished | 2023-03-05 22:53:23.005522 |
| Duration | 1 minute and 59.51 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
ID
Real number (ℝ)
| Distinct | 36457 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5078742.9 |
| Minimum | 5008804 |
| Maximum | 5150487 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | 5008804 |
|---|---|
| 5-th percentile | 5018481 |
| Q1 | 5044568.5 |
| median | 5069530 |
| Q3 | 5115551 |
| 95-th percentile | 5146052 |
| Maximum | 5150487 |
| Range | 141683 |
| Interquartile range (IQR) | 70982.5 |
Descriptive statistics
| Standard deviation | 41804.425 |
|---|---|
| Coefficient of variation (CV) | 0.0082312543 |
| Kurtosis | -1.2068975 |
| Mean | 5078742.9 |
| Median Absolute Deviation (MAD) | 35960 |
| Skewness | 0.073626481 |
| Sum | 3.9498146 × 10^12 |
| Variance | 1.7476099 × 10^9 |
| Monotonicity | Not monotonic |
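Several of the ID statistics are derived from one another, so they can be cross-checked from the reported values. A minimal sketch, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations); small differences are expected because the inputs are rounded:

```python
import math

# Figures copied from the ID tables in this report.
n_observations = 777_715
minimum, maximum = 5_008_804, 5_150_487
q1, q3 = 5_044_568.5, 5_115_551
mean, std = 5_078_742.9, 41_804.425

value_range = maximum - minimum   # reported Range: 141683
iqr = q3 - q1                     # reported IQR: 70982.5
cv = std / mean                   # reported CV: 0.0082312543
variance = std ** 2               # reported Variance: about 1.7476099e9
total = mean * n_observations     # reported Sum: about 3.9498146e12

print(value_range, iqr)
print(f"CV = {cv:.10f}, variance = {variance:.7e}, sum = {total:.7e}")
```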
| Value | Count | Frequency (%) |
| 5090630 | 61 | < 0.1% |
| 5148524 | 61 | < 0.1% |
| 5066707 | 61 | < 0.1% |
| 5061848 | 61 | < 0.1% |
| 5118380 | 61 | < 0.1% |
| 5112636 | 61 | < 0.1% |
| 5009106 | 61 | < 0.1% |
| 5099880 | 61 | < 0.1% |
| 5085886 | 61 | < 0.1% |
| 5045838 | 61 | < 0.1% |
| Other values (36447) | 777105 |
| Value | Count | Frequency (%) |
| 5008804 | 16 | |
| 5008805 | 15 | < 0.1% |
| 5008806 | 30 | |
| 5008808 | 5 | < 0.1% |
| 5008809 | 5 | < 0.1% |
| 5008810 | 27 | |
| 5008811 | 39 | |
| 5008812 | 17 | |
| 5008813 | 17 | |
| 5008814 | 17 |
| Value | Count | Frequency (%) |
| 5150487 | 30 | |
| 5150485 | 2 | < 0.1% |
| 5150484 | 13 | < 0.1% |
| 5150483 | 18 | |
| 5150482 | 18 | |
| 5150481 | 43 | |
| 5150480 | 26 | |
| 5150479 | 9 | < 0.1% |
| 5150478 | 14 | < 0.1% |
| 5150477 | 21 |
CODE_GENDER
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | M |
|---|---|
| 2nd row | M |
| 3rd row | M |
| 4th row | M |
| 5th row | M |
Common Values
| Value | Count | Frequency (%) |
| F | 518851 | |
| M | 258864 |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| f | 518851 | |
| m | 258864 |
Most occurring characters
| Value | Count | Frequency (%) |
| F | 518851 | |
| M | 258864 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 777715 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 518851 | |
| M | 258864 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 777715 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| F | 518851 | |
| M | 258864 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| F | 518851 | |
| M | 258864 |
FLAG_OWN_CAR
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.7 MiB |
| Value | Count | Frequency (%) |
| False | 473355 | |
| True | 304360 |
FLAG_OWN_REALTY
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.7 MiB |
| Value | Count | Frequency (%) |
| True | 512948 | |
| False | 264767 |
CNT_CHILDREN
Real number (ℝ)
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.42808227 |
| Minimum | 0 |
| Maximum | 19 |
| Zeros | 540639 |
| Zeros (%) | 69.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7457552 |
|---|---|
| Coefficient of variation (CV) | 1.7420839 |
| Kurtosis | 21.025355 |
| Mean | 0.42808227 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.5873102 |
| Sum | 332926 |
| Variance | 0.55615083 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 540639 | 69.5% |
| 1 | 155638 | 20.0% |
| 2 | 70399 | 9.1% |
| 3 | 9328 | 1.2% |
| 4 | 1224 | 0.2% |
| 5 | 324 | < 0.1% |
| 14 | 111 | < 0.1% |
| 7 | 46 | < 0.1% |
| 19 | 6 | < 0.1% |
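The frequency table above is enough to reproduce the reported sum, mean, and share of zeros for this variable. A minimal sketch using only the counts listed in the table (the dictionary name is illustrative):

```python
# Value -> count, copied from the frequency table above.
children_counts = {
    0: 540_639, 1: 155_638, 2: 70_399, 3: 9_328, 4: 1_224,
    5: 324, 14: 111, 7: 46, 19: 6,
}

n_rows = sum(children_counts.values())
total_children = sum(value * count for value, count in children_counts.items())
mean_children = total_children / n_rows
zero_pct = 100 * children_counts[0] / n_rows

print(n_rows)                    # 777715, the number of observations
print(total_children)            # 332926, the reported Sum
print(f"{mean_children:.8f}")    # 0.42808227, the reported Mean
print(f"{zero_pct:.1f}%")        # 69.5%, the reported Zeros (%)
```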
| Value | Count | Frequency (%) |
| 0 | 540639 | 69.5% |
| 1 | 155638 | 20.0% |
| 2 | 70399 | 9.1% |
| 3 | 9328 | 1.2% |
| 4 | 1224 | 0.2% |
| 5 | 324 | < 0.1% |
| 7 | 46 | < 0.1% |
| 14 | 111 | < 0.1% |
| 19 | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| 19 | 6 | < 0.1% |
| 14 | 111 | < 0.1% |
| 7 | 46 | < 0.1% |
| 5 | 324 | < 0.1% |
| 4 | 1224 | 0.2% |
| 3 | 9328 | 1.2% |
| 2 | 70399 | 9.1% |
| 1 | 155638 | 20.0% |
| 0 | 540639 | 69.5% |
AMT_INCOME_TOTAL
Real number (ℝ)
| Distinct | 265 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 188534.8 |
| Minimum | 27000 |
| Maximum | 1575000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | 27000 |
|---|---|
| 5-th percentile | 76500 |
| Q1 | 121500 |
| median | 162000 |
| Q3 | 225000 |
| 95-th percentile | 360000 |
| Maximum | 1575000 |
| Range | 1548000 |
| Interquartile range (IQR) | 103500 |
Descriptive statistics
| Standard deviation | 101622.45 |
|---|---|
| Coefficient of variation (CV) | 0.53901163 |
| Kurtosis | 15.804592 |
| Mean | 188534.8 |
| Median Absolute Deviation (MAD) | 49500 |
| Skewness | 2.5776417 |
| Sum | 1.4662634 × 10^11 |
| Variance | 1.0327122 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 135000 | 90217 | 11.6% |
| 180000 | 68579 | 8.8% |
| 157500 | 62686 | 8.1% |
| 112500 | 61622 | 7.9% |
| 225000 | 61399 | 7.9% |
| 202500 | 47707 | 6.1% |
| 270000 | 37222 | 4.8% |
| 90000 | 36337 | 4.7% |
| 315000 | 23136 | 3.0% |
| 67500 | 18822 | 2.4% |
| Other values (255) | 269988 |
| Value | Count | Frequency (%) |
| 27000 | 78 | < 0.1% |
| 29250 | 44 | < 0.1% |
| 30150 | 79 | < 0.1% |
| 31500 | 240 | |
| 31531.5 | 65 | < 0.1% |
| 31950 | 7 | < 0.1% |
| 32400 | 32 | < 0.1% |
| 33300 | 323 | |
| 33750 | 8 | < 0.1% |
| 36000 | 144 |
| Value | Count | Frequency (%) |
| 1575000 | 150 | < 0.1% |
| 1350000 | 102 | < 0.1% |
| 1125000 | 83 | < 0.1% |
| 990000 | 26 | < 0.1% |
| 945000 | 48 | < 0.1% |
| 900000 | 844 | |
| 810000 | 317 | < 0.1% |
| 787500 | 42 | < 0.1% |
| 765000 | 122 | < 0.1% |
| 742500 | 100 | < 0.1% |
NAME_INCOME_TYPE
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 20 |
|---|---|
| Median length | 7 |
| Mean length | 10.900415 |
| Min length | 7 |
Characters and Unicode
| Total characters | 8477416 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Working |
|---|---|
| 2nd row | Working |
| 3rd row | Working |
| 4th row | Working |
| 5th row | Working |
Common Values
| Value | Count | Frequency (%) |
| Working | 400164 | |
| Commercial associate | 183385 | |
| Pensioner | 128392 | 16.5% |
| State servant | 65437 | 8.4% |
| Student | 337 | < 0.1% |
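The "Total characters" and "Mean length" figures for this variable can be recomputed from the value counts above, since each label has a fixed length. A minimal sketch (the dictionary name is illustrative):

```python
# Label -> count, copied from the "Common Values" table above.
income_type_counts = {
    "Working": 400_164,
    "Commercial associate": 183_385,
    "Pensioner": 128_392,
    "State servant": 65_437,
    "Student": 337,
}

n_rows = sum(income_type_counts.values())
# Each row contributes len(label) characters, spaces included.
total_chars = sum(len(label) * count for label, count in income_type_counts.items())
mean_length = total_chars / n_rows

print(total_chars)             # 8477416, the reported Total characters
print(f"{mean_length:.6f}")    # 10.900415, the reported Mean length
```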
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| working | 400164 | |
| commercial | 183385 | |
| associate | 183385 | |
| pensioner | 128392 | 12.5% |
| state | 65437 | 6.4% |
| servant | 65437 | 6.4% |
| student | 337 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 895326 | |
| o | 895326 | |
| r | 777378 | 9.2% |
| e | 754765 | 8.9% |
| n | 722722 | 8.5% |
| a | 681029 | 8.0% |
| s | 560599 | 6.6% |
| W | 400164 | 4.7% |
| k | 400164 | 4.7% |
| g | 400164 | 4.7% |
| Other values (11) | 1989779 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7450879 | |
| Uppercase Letter | 777715 | 9.2% |
| Space Separator | 248822 | 2.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 895326 | |
| o | 895326 | |
| r | 777378 | |
| e | 754765 | |
| n | 722722 | |
| a | 681029 | |
| s | 560599 | |
| k | 400164 | |
| g | 400164 | |
| t | 380370 | 5.1% |
| Other values (6) | 983036 |
Uppercase Letter
| Value | Count | Frequency (%) |
| W | 400164 | |
| C | 183385 | |
| P | 128392 | 16.5% |
| S | 65774 | 8.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 248822 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 8228594 | |
| Common | 248822 | 2.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 895326 | |
| o | 895326 | |
| r | 777378 | |
| e | 754765 | |
| n | 722722 | |
| a | 681029 | 8.3% |
| s | 560599 | 6.8% |
| W | 400164 | 4.9% |
| k | 400164 | 4.9% |
| g | 400164 | 4.9% |
| Other values (10) | 1740957 |
Common
| Value | Count | Frequency (%) |
| (space) | 248822 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8477416 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 895326 | |
| o | 895326 | |
| r | 777378 | 9.2% |
| e | 754765 | 8.9% |
| n | 722722 | 8.5% |
| a | 681029 | 8.0% |
| s | 560599 | 6.6% |
| W | 400164 | 4.7% |
| k | 400164 | 4.7% |
| g | 400164 | 4.7% |
| Other values (11) | 1989779 |
NAME_EDUCATION_TYPE
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 29 |
|---|---|
| Median length | 29 |
| Mean length | 24.790148 |
| Min length | 15 |
Characters and Unicode
| Total characters | 19279670 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Higher education |
|---|---|
| 2nd row | Higher education |
| 3rd row | Higher education |
| 4th row | Higher education |
| 5th row | Higher education |
Common Values
| Value | Count | Frequency (%) |
| Secondary / secondary special | 524261 | |
| Higher education | 213633 | |
| Incomplete higher | 30329 | 3.9% |
| Lower secondary | 8655 | 1.1% |
| Academic degree | 837 | 0.1% |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| secondary | 1057177 | |
| (empty) | 524261 | |
| special | 524261 | |
| higher | 243962 | 9.4% |
| education | 213633 | 8.2% |
| incomplete | 30329 | 1.2% |
| lower | 8655 | 0.3% |
| academic | 837 | < 0.1% |
| degree | 837 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 2111694 | |
| c | 1827074 | |
| (space) | 1826237 | |
| a | 1795908 | |
| r | 1310631 | 6.8% |
| o | 1309794 | 6.8% |
| n | 1301139 | 6.7% |
| d | 1272484 | 6.6% |
| y | 1057177 | 5.5% |
| s | 1057177 | 5.5% |
| Other values (15) | 4410355 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 16151457 | |
| Space Separator | 1826237 | 9.5% |
| Uppercase Letter | 777715 | 4.0% |
| Other Punctuation | 524261 | 2.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 2111694 | |
| c | 1827074 | |
| a | 1795908 | |
| r | 1310631 | |
| o | 1309794 | |
| n | 1301139 | |
| d | 1272484 | |
| y | 1057177 | |
| s | 1057177 | |
| i | 982693 | |
| Other values (8) | 2125686 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 524261 | |
| H | 213633 | |
| I | 30329 | 3.9% |
| L | 8655 | 1.1% |
| A | 837 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1826237 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 524261 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16929172 | |
| Common | 2350498 | 12.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 2111694 | |
| c | 1827074 | |
| a | 1795908 | |
| r | 1310631 | |
| o | 1309794 | |
| n | 1301139 | |
| d | 1272484 | |
| y | 1057177 | 6.2% |
| s | 1057177 | 6.2% |
| i | 982693 | 5.8% |
| Other values (13) | 2903401 |
Common
| Value | Count | Frequency (%) |
| (space) | 1826237 | |
| / | 524261 | 22.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 19279670 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 2111694 | |
| c | 1827074 | |
| (space) | 1826237 | |
| a | 1795908 | |
| r | 1310631 | 6.8% |
| o | 1309794 | 6.8% |
| n | 1301139 | 6.7% |
| d | 1272484 | 6.6% |
| y | 1057177 | 5.5% |
| s | 1057177 | 5.5% |
| Other values (15) | 4410355 |
NAME_FAMILY_STATUS
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 20 |
|---|---|
| Median length | 7 |
| Mean length | 9.1562282 |
| Min length | 5 |
Characters and Unicode
| Total characters | 7120936 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Civil marriage |
|---|---|
| 2nd row | Civil marriage |
| 3rd row | Civil marriage |
| 4th row | Civil marriage |
| 5th row | Civil marriage |
Common Values
| Value | Count | Frequency (%) |
| Married | 546619 | |
| Single / not married | 94335 | 12.1% |
| Civil marriage | 60342 | 7.8% |
| Separated | 45255 | 5.8% |
| Widow | 31164 | 4.0% |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| married | 640954 | |
| single | 94335 | 8.4% |
| (empty) | 94335 | 8.4% |
| not | 94335 | 8.4% |
| civil | 60342 | 5.4% |
| marriage | 60342 | 5.4% |
| separated | 45255 | 4.0% |
| widow | 31164 | 2.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 1447847 | |
| i | 947479 | |
| e | 886141 | |
| a | 852148 | |
| d | 717373 | |
| M | 546619 | 7.7% |
| (space) | 343347 | 4.8% |
| n | 188670 | 2.6% |
| g | 154677 | 2.2% |
| l | 154677 | 2.2% |
| Other values (10) | 881958 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5905539 | |
| Uppercase Letter | 777715 | 10.9% |
| Space Separator | 343347 | 4.8% |
| Other Punctuation | 94335 | 1.3% |
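The category names in the table above match Unicode general categories (Lowercase Letter = Ll, Uppercase Letter = Lu, Space Separator = Zs, Other Punctuation = Po). A minimal sketch that recomputes the category totals from the value counts with Python's standard `unicodedata` module; this illustrates the idea and is not necessarily how the report itself computes them, and the helper name `category_counts` is hypothetical:

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Label -> count, copied from the NAME_FAMILY_STATUS "Common Values" table.
family_status_counts = {
    "Married": 546_619,
    "Single / not married": 94_335,
    "Civil marriage": 60_342,
    "Separated": 45_255,
    "Widow": 31_164,
}

# Weight each label's per-character categories by how often the label occurs.
totals = Counter()
for label, count in family_status_counts.items():
    for category, k in category_counts(label).items():
        totals[category] += k * count

print(category_counts("Single / not married"))
print(totals["Ll"], totals["Lu"], totals["Zs"], totals["Po"])
# 5905539 777715 343347 94335
```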
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1447847 | |
| i | 947479 | |
| e | 886141 | |
| a | 852148 | |
| d | 717373 | |
| n | 188670 | 3.2% |
| g | 154677 | 2.6% |
| l | 154677 | 2.6% |
| m | 154677 | 2.6% |
| t | 139590 | 2.4% |
| Other values (4) | 262260 | 4.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 546619 | |
| S | 139590 | 17.9% |
| C | 60342 | 7.8% |
| W | 31164 | 4.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 343347 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 94335 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6683254 | |
| Common | 437682 | 6.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 1447847 | |
| i | 947479 | |
| e | 886141 | |
| a | 852148 | |
| d | 717373 | |
| M | 546619 | 8.2% |
| n | 188670 | 2.8% |
| g | 154677 | 2.3% |
| l | 154677 | 2.3% |
| m | 154677 | 2.3% |
| Other values (8) | 632946 |
Common
| Value | Count | Frequency (%) |
| (space) | 343347 | |
| / | 94335 | 21.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7120936 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 1447847 | |
| i | 947479 | |
| e | 886141 | |
| a | 852148 | |
| d | 717373 | |
| M | 546619 | 7.7% |
| (space) | 343347 | 4.8% |
| n | 188670 | 2.6% |
| g | 154677 | 2.2% |
| l | 154677 | 2.2% |
| Other values (10) | 881958 |
NAME_HOUSING_TYPE
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 19 |
|---|---|
| Median length | 17 |
| Mean length | 16.802963 |
| Min length | 12 |
Characters and Unicode
| Total characters | 13067916 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Rented apartment |
|---|---|
| 2nd row | Rented apartment |
| 3rd row | Rented apartment |
| 4th row | Rented apartment |
| 5th row | Rented apartment |
Common Values
| Value | Count | Frequency (%) |
| House / apartment | 697151 | |
| With parents | 35735 | 4.6% |
| Municipal apartment | 24640 | 3.2% |
| Rented apartment | 10898 | 1.4% |
| Office apartment | 5636 | 0.7% |
| Co-op apartment | 3655 | 0.5% |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| apartment | 741980 | |
| house | 697151 | |
| (empty) | 697151 | |
| with | 35735 | 1.6% |
| parents | 35735 | 1.6% |
| municipal | 24640 | 1.1% |
| rented | 10898 | 0.5% |
| office | 5636 | 0.3% |
| co-op | 3655 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 1566328 | |
| a | 1544335 | |
| e | 1502298 | |
| (space) | 1474866 | |
| n | 813253 | 6.2% |
| p | 806010 | 6.2% |
| r | 777715 | 6.0% |
| m | 741980 | 5.7% |
| s | 732886 | 5.6% |
| u | 721791 | 5.5% |
| Other values (15) | 2386454 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 10114529 | |
| Space Separator | 1474866 | 11.3% |
| Uppercase Letter | 777715 | 6.0% |
| Other Punctuation | 697151 | 5.3% |
| Dash Punctuation | 3655 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 1566328 | |
| a | 1544335 | |
| e | 1502298 | |
| n | 813253 | |
| p | 806010 | |
| r | 777715 | |
| m | 741980 | |
| s | 732886 | |
| u | 721791 | |
| o | 704461 | |
| Other values (6) | 203472 | 2.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| H | 697151 | |
| W | 35735 | 4.6% |
| M | 24640 | 3.2% |
| R | 10898 | 1.4% |
| O | 5636 | 0.7% |
| C | 3655 | 0.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1474866 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 697151 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 3655 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 10892244 | |
| Common | 2175672 | 16.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 1566328 | |
| a | 1544335 | |
| e | 1502298 | |
| n | 813253 | |
| p | 806010 | |
| r | 777715 | |
| m | 741980 | |
| s | 732886 | |
| u | 721791 | |
| o | 704461 | |
| Other values (12) | 981187 |
Common
| Value | Count | Frequency (%) |
| (space) | 1474866 | |
| / | 697151 | |
| - | 3655 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13067916 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 1566328 | |
| a | 1544335 | |
| e | 1502298 | |
| (space) | 1474866 | |
| n | 813253 | 6.2% |
| p | 806010 | 6.2% |
| r | 777715 | 6.0% |
| m | 741980 | 5.7% |
| s | 732886 | 5.6% |
| u | 721791 | 5.5% |
| Other values (15) | 2386454 |
DAYS_BIRTH
Real number (ℝ)
| Distinct | 7183 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -16124.937 |
| Minimum | -25152 |
| Maximum | -7489 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 777715 |
| Negative (%) | 100.0% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | -25152 |
|---|---|
| 5-th percentile | -23015 |
| Q1 | -19453 |
| median | -15760 |
| Q3 | -12716 |
| 95-th percentile | -10048 |
| Maximum | -7489 |
| Range | 17663 |
| Interquartile range (IQR) | 6737 |
Descriptive statistics
| Standard deviation | 4104.304 |
|---|---|
| Coefficient of variation (CV) | -0.25453148 |
| Kurtosis | -1.0227322 |
| Mean | -16124.937 |
| Median Absolute Deviation (MAD) | 3327 |
| Skewness | -0.17693133 |
| Sum | -1.2540605 × 10^10 |
| Variance | 16845311 |
| Monotonicity | Not monotonic |
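Every DAYS_BIRTH value is negative, which suggests the column stores a day offset counted backwards from some reference date. The report does not state this, so the sketch below treats it as an assumption and converts the summary statistics into approximate ages in years; `DAYS_PER_YEAR` and `days_to_years` are illustrative names:

```python
# Assumption: values are days before a reference date, so -days / 365.25 is an age.
DAYS_PER_YEAR = 365.25  # average year length including leap years

def days_to_years(days_before_reference):
    """Convert a negative day offset into a positive number of years."""
    return -days_before_reference / DAYS_PER_YEAR

# Figures copied from the DAYS_BIRTH tables above.
summary = {
    "Minimum": -25152,
    "Q1": -19453,
    "median": -15760,
    "Mean": -16124.937,
    "Q3": -12716,
    "Maximum": -7489,
}

for label, days in summary.items():
    print(f"{label}: {days_to_years(days):.1f} years")
```

Under this reading the most negative value corresponds to the oldest person, so "Minimum" maps to the largest age.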
| Value | Count | Frequency (%) |
| -14667 | 1018 | 0.1% |
| -15140 | 928 | 0.1% |
| -15675 | 835 | 0.1% |
| -15519 | 799 | 0.1% |
| -16995 | 799 | 0.1% |
| -13788 | 796 | 0.1% |
| -12483 | 793 | 0.1% |
| -14636 | 787 | 0.1% |
| -13300 | 784 | 0.1% |
| -12569 | 720 | 0.1% |
| Other values (7173) | 769456 |
| Value | Count | Frequency (%) |
| -25152 | 84 | |
| -25140 | 112 | |
| -25099 | 32 | < 0.1% |
| -25088 | 32 | < 0.1% |
| -25010 | 21 | < 0.1% |
| -24970 | 47 | < 0.1% |
| -24963 | 17 | < 0.1% |
| -24946 | 97 | |
| -24932 | 146 | |
| -24914 | 129 |
| Value | Count | Frequency (%) |
| -7489 | 1 | < 0.1% |
| -7705 | 5 | < 0.1% |
| -7723 | 2 | < 0.1% |
| -7757 | 52 | |
| -7959 | 13 | < 0.1% |
| -7980 | 17 | < 0.1% |
| -8041 | 78 | |
| -8054 | 4 | < 0.1% |
| -8056 | 25 | < 0.1% |
| -8067 | 13 | < 0.1% |
DAYS_EMPLOYED
Real number (ℝ)
| Distinct | 3640 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 57775.825 |
| Minimum | -15713 |
| Maximum | 365243 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 649743 |
| Negative (%) | 83.5% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | -15713 |
|---|---|
| 5-th percentile | -7369 |
| Q1 | -3292 |
| median | -1682 |
| Q3 | -431 |
| 95-th percentile | 365243 |
| Maximum | 365243 |
| Range | 380956 |
| Interquartile range (IQR) | 2861 |
Descriptive statistics
| Standard deviation | 136471.74 |
|---|---|
| Coefficient of variation (CV) | 2.3620906 |
| Kurtosis | 1.2722827 |
| Mean | 57775.825 |
| Median Absolute Deviation (MAD) | 1379 |
| Skewness | 1.808405 |
| Sum | 4.4933126 × 10^10 |
| Variance | 1.8624535 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 365243 | 127972 | 16.5% |
| -1751 | 1601 | 0.2% |
| -1539 | 1545 | 0.2% |
| -401 | 1498 | 0.2% |
| -2531 | 1319 | 0.2% |
| -108 | 1319 | 0.2% |
| -200 | 1221 | 0.2% |
| -1812 | 1219 | 0.2% |
| -1678 | 1179 | 0.2% |
| -2087 | 1176 | 0.2% |
| Other values (3630) | 637666 |
| Value | Count | Frequency (%) |
| -15713 | 17 | < 0.1% |
| -15661 | 116 | < 0.1% |
| -15227 | 18 | < 0.1% |
| -15072 | 33 | < 0.1% |
| -15038 | 544 | |
| -14887 | 186 | < 0.1% |
| -14810 | 296 | |
| -14775 | 58 | < 0.1% |
| -14536 | 160 | < 0.1% |
| -14473 | 204 | < 0.1% |
| Value | Count | Frequency (%) |
| 365243 | 127972 | 16.5% |
| -17 | 33 | < 0.1% |
| -43 | 29 | < 0.1% |
| -65 | 45 | < 0.1% |
| -66 | 21 | < 0.1% |
| -70 | 45 | < 0.1% |
| -71 | 18 | < 0.1% |
| -73 | 374 | < 0.1% |
| -78 | 8 | < 0.1% |
| -79 | 27 | < 0.1% |
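The single value 365243 accounts for every non-negative row of DAYS_EMPLOYED (127972 rows, while the other 649743 rows are negative and there are no zeros). At roughly 1000 years it looks like a placeholder rather than a real duration; that interpretation is an assumption, not something the report states. A minimal sketch of how much this one value shifts the mean, using only the reported figures:

```python
# Figures copied from the DAYS_EMPLOYED tables above.
n_rows = 777_715
placeholder_value = 365_243      # assumption: treated as a placeholder
placeholder_count = 127_972
negative_count = 649_743
overall_mean = 57_775.825

# The placeholder rows and the negative rows together cover the whole column.
assert placeholder_count + negative_count == n_rows

placeholder_pct = 100 * placeholder_count / n_rows

# Remove the placeholder rows' contribution from the total, then re-average.
total = overall_mean * n_rows
mean_without_placeholder = (total - placeholder_value * placeholder_count) / negative_count

print(f"Placeholder share: {placeholder_pct:.1f}%")   # 16.5%
print(f"Mean without placeholder: {mean_without_placeholder:.0f} days")
```

With the placeholder rows excluded, the mean falls from about +57776 to roughly −2780 days, which is in line with the reported median of −1682 and the negative quartiles.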
FLAG_MOBIL
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 1 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 777715 |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| 1 | 777715 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 777715 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 777715 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 777715 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 777715 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 777715 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 777715 |
FLAG_WORK_PHONE
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 777715 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 777715 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 597427 | |
| 1 | 180288 | 23.2% |
FLAG_PHONE
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 777715 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 777715 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 543650 | |
| 1 | 234065 |
FLAG_EMAIL
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
[Plots not shown: Length, Common Values (Plot)]
Words
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 777715 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 777715 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 706418 | |
| 1 | 71297 | 9.2% |
OCCUPATION_TYPE
Categorical
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 240048 |
| Missing (%) | 30.9% |
| Memory size | 11.9 MiB |
Length
| Max length | 21 |
|---|---|
| Median length | 20 |
| Mean length | 10.515029 |
| Min length | 7 |
Characters and Unicode
| Total characters | 5653584 |
|---|---|
| Distinct characters | 36 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Security staff |
|---|---|
| 2nd row | Security staff |
| 3rd row | Security staff |
| 4th row | Security staff |
| 5th row | Security staff |
Common Values
| Value | Count | Frequency (%) |
| Laborers | 131572 | |
| Core staff | 77112 | 9.9% |
| Sales staff | 70362 | 9.0% |
| Managers | 67738 | 8.7% |
| Drivers | 47678 | 6.1% |
| High skill tech staff | 31768 | 4.1% |
| Accountants | 27223 | 3.5% |
| Medicine staff | 26691 | 3.4% |
| Cooking staff | 13416 | 1.7% |
| Security staff | 12400 | 1.6% |
| Other values (8) | 31707 | 4.1% |
| (Missing) | 240048 | 30.9% |
[Plot not shown: Length]
Words
| Value | Count | Frequency (%) |
| staff | 255424 | |
| laborers | 135195 | |
| core | 77112 | 8.9% |
| sales | 70362 | 8.1% |
| managers | 67738 | 7.8% |
| drivers | 47678 | 5.5% |
| high | 31768 | 3.7% |
| skill | 31768 | 3.7% |
| tech | 31768 | 3.7% |
| accountants | 27223 | 3.1% |
| Other values (13) | 92188 | 10.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| s | 652691 | 11.5% |
| a | 652576 | 11.5% |
| r | 547836 | 9.7% |
| e | 544257 | 9.6% |
| f | 510848 | 9.0% |
| t | 368978 | 6.5% |
| (space) | 330557 | 5.8% |
| o | 269985 | 4.8% |
| i | 224568 | 4.0% |
| n | 188906 | 3.3% |
| Other values (26) | 1362382 | 24.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4772552 | 84.4% |
| Uppercase Letter | 544295 | 9.6% |
| Space Separator | 330557 | 5.8% |
| Dash Punctuation | 3623 | 0.1% |
| Other Punctuation | 2557 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| s | 652691 | 13.7% |
| a | 652576 | 13.7% |
| r | 547836 | 11.5% |
| e | 544257 | 11.4% |
| f | 510848 | 10.7% |
| t | 368978 | 7.7% |
| o | 269985 | 5.7% |
| i | 224568 | 4.7% |
| n | 188906 | 4.0% |
| l | 153803 | 3.2% |
| Other values (11) | 658104 | 13.8% |
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 138818 | 25.5% |
| C | 101927 | 18.7% |
| M | 94429 | 17.3% |
| S | 85911 | 15.8% |
| D | 47678 | 8.8% |
| H | 33454 | 6.1% |
| A | 27223 | 5.0% |
| P | 6714 | 1.2% |
| R | 2946 | 0.5% |
| W | 2557 | 0.5% |
| Other values (2) | 2638 | 0.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 330557 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 3623 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 2557 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5316847 | 94.0% |
| Common | 336737 | 6.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| s | 652691 | 12.3% |
| a | 652576 | 12.3% |
| r | 547836 | 10.3% |
| e | 544257 | 10.2% |
| f | 510848 | 9.6% |
| t | 368978 | 6.9% |
| o | 269985 | 5.1% |
| i | 224568 | 4.2% |
| n | 188906 | 3.6% |
| l | 153803 | 2.9% |
| Other values (23) | 1202399 | 22.6% |
Common
| Value | Count | Frequency (%) |
| (space) | 330557 | 98.2% |
| - | 3623 | 1.1% |
| / | 2557 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5653584 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| s | 652691 | 11.5% |
| a | 652576 | 11.5% |
| r | 547836 | 9.7% |
| e | 544257 | 9.6% |
| f | 510848 | 9.0% |
| t | 368978 | 6.5% |
| (space) | 330557 | 5.8% |
| o | 269985 | 4.8% |
| i | 224568 | 4.0% |
| n | 188906 | 3.3% |
| Other values (26) | 1362382 | 24.1% |
CNT_FAM_MEMBERS
Real number (ℝ)
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2088374 |
| Minimum | 1 |
|---|---|
| Maximum | 20 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 20 |
| Range | 19 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.90737972 |
|---|---|
| Coefficient of variation (CV) | 0.41079516 |
| Kurtosis | 7.7223182 |
| Mean | 2.2088374 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.3241752 |
| Sum | 1717846 |
| Variance | 0.82333796 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 2 | 423723 | 54.5% |
| 1 | 141477 | 18.2% |
| 3 | 134894 | 17.3% |
| 4 | 66990 | 8.6% |
| 5 | 8999 | 1.2% |
| 6 | 1196 | 0.2% |
| 7 | 273 | < 0.1% |
| 15 | 111 | < 0.1% |
| 9 | 46 | < 0.1% |
| 20 | 6 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 141477 | 18.2% |
| 2 | 423723 | 54.5% |
| 3 | 134894 | 17.3% |
| 4 | 66990 | 8.6% |
| 5 | 8999 | 1.2% |
| 6 | 1196 | 0.2% |
| 7 | 273 | < 0.1% |
| 9 | 46 | < 0.1% |
| 15 | 111 | < 0.1% |
| 20 | 6 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 20 | 6 | < 0.1% |
| 15 | 111 | < 0.1% |
| 9 | 46 | < 0.1% |
| 7 | 273 | < 0.1% |
| 6 | 1196 | 0.2% |
| 5 | 8999 | 1.2% |
| 4 | 66990 | 8.6% |
| 3 | 134894 | 17.3% |
| 2 | 423723 | 54.5% |
| 1 | 141477 | 18.2% |
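The summary statistics reported for CNT_FAM_MEMBERS can be cross-checked against its frequency table. The following is a minimal sketch in plain Python, not part of the report itself: the dictionary restates the value counts listed above, and the use of the sample variance (n - 1 denominator) is an assumption about the report's convention, chosen because it reproduces the reported figure.

```python
from math import sqrt

# Value -> count, copied from the CNT_FAM_MEMBERS frequency table above.
counts = {1: 141477, 2: 423723, 3: 134894, 4: 66990, 5: 8999,
          6: 1196, 7: 273, 9: 46, 15: 111, 20: 6}

n = sum(counts.values())                        # number of observations
total = sum(v * c for v, c in counts.items())   # corresponds to the "Sum" row
mean = total / n

# Sample variance with an n - 1 denominator (assumed convention, see above).
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
cv = sqrt(variance) / mean                      # coefficient of variation

# Share of the most common value (2), as a percentage with one decimal.
share_of_two = round(100 * counts[2] / n, 1)

print(n, total, round(mean, 7), round(variance, 4), round(cv, 4), share_of_two)
```

The same weighted-sum approach applies to any numeric variable whose full frequency table is available; it does not work when rows are collapsed into "Other values".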
MONTHS_BALANCE
Real number (ℝ)
| Distinct | 61 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -19.373564 |
| Minimum | -60 |
|---|---|
| Maximum | 0 |
| Zeros | 24672 |
| Zeros (%) | 3.2% |
| Negative | 753043 |
| Negative (%) | 96.8% |
| Memory size | 11.9 MiB |
Quantile statistics
| Minimum | -60 |
|---|---|
| 5-th percentile | -46 |
| Q1 | -29 |
| median | -17 |
| Q3 | -8 |
| 95-th percentile | -1 |
| Maximum | 0 |
| Range | 60 |
| Interquartile range (IQR) | 21 |
Descriptive statistics
| Standard deviation | 14.082208 |
|---|---|
| Coefficient of variation (CV) | -0.72687753 |
| Kurtosis | -0.51805712 |
| Mean | -19.373564 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | -0.59867375 |
| Sum | -15067111 |
| Variance | 198.30858 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| -1 | 24963 | 3.2% |
| -2 | 24871 | 3.2% |
| 0 | 24672 | 3.2% |
| -3 | 24644 | 3.2% |
| -4 | 24274 | 3.1% |
| -5 | 23899 | 3.1% |
| -6 | 23473 | 3.0% |
| -7 | 23018 | 3.0% |
| -8 | 22494 | 2.9% |
| -9 | 22090 | 2.8% |
| Other values (51) | 539317 | 69.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -60 | 321 | < 0.1% |
| -59 | 627 | 0.1% |
| -58 | 955 | 0.1% |
| -57 | 1253 | 0.2% |
| -56 | 1588 | 0.2% |
| -55 | 1939 | 0.2% |
| -54 | 2279 | 0.3% |
| -53 | 2633 | 0.3% |
| -52 | 3070 | 0.4% |
| -51 | 3514 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0 | 24672 | 3.2% |
| -1 | 24963 | 3.2% |
| -2 | 24871 | 3.2% |
| -3 | 24644 | 3.2% |
| -4 | 24274 | 3.1% |
| -5 | 23899 | 3.1% |
| -6 | 23473 | 3.0% |
| -7 | 23018 | 3.0% |
| -8 | 22494 | 2.9% |
| -9 | 22090 | 2.8% |
STATUS
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.9 MiB |
| C | 329536 |
|---|---|
| 0 | 290654 |
| X | 145950 |
| 1 | 8747 |
| 5 | 1527 |
| Other values (3) | 1301 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 777715 |
|---|---|
| Distinct characters | 8 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | C |
|---|---|
| 2nd row | C |
| 3rd row | C |
| 4th row | C |
| 5th row | C |
Common Values
| Value | Count | Frequency (%) |
| C | 329536 | 42.4% |
| 0 | 290654 | 37.4% |
| X | 145950 | 18.8% |
| 1 | 8747 | 1.1% |
| 5 | 1527 | 0.2% |
| 2 | 801 | 0.1% |
| 3 | 286 | < 0.1% |
| 4 | 214 | < 0.1% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| c | 329536 | 42.4% |
| 0 | 290654 | 37.4% |
| x | 145950 | 18.8% |
| 1 | 8747 | 1.1% |
| 5 | 1527 | 0.2% |
| 2 | 801 | 0.1% |
| 3 | 286 | < 0.1% |
| 4 | 214 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 329536 | 42.4% |
| 0 | 290654 | 37.4% |
| X | 145950 | 18.8% |
| 1 | 8747 | 1.1% |
| 5 | 1527 | 0.2% |
| 2 | 801 | 0.1% |
| 3 | 286 | < 0.1% |
| 4 | 214 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 475486 | 61.1% |
| Decimal Number | 302229 | 38.9% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 290654 | 96.2% |
| 1 | 8747 | 2.9% |
| 5 | 1527 | 0.5% |
| 2 | 801 | 0.3% |
| 3 | 286 | 0.1% |
| 4 | 214 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 329536 | 69.3% |
| X | 145950 | 30.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 475486 | 61.1% |
| Common | 302229 | 38.9% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 290654 | 96.2% |
| 1 | 8747 | 2.9% |
| 5 | 1527 | 0.5% |
| 2 | 801 | 0.3% |
| 3 | 286 | 0.1% |
| 4 | 214 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| C | 329536 | 69.3% |
| X | 145950 | 30.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 777715 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 329536 | 42.4% |
| 0 | 290654 | 37.4% |
| X | 145950 | 18.8% |
| 1 | 8747 | 1.1% |
| 5 | 1527 | 0.2% |
| 2 | 801 | 0.1% |
| 3 | 286 | < 0.1% |
| 4 | 214 | < 0.1% |
Auto
The auto setting is an interpretable pairwise column metric that uses the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
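The mapping above can be read as a small lookup on the pair of column types. The sketch below is illustrative only: `auto_metric` is a hypothetical helper written for this example, not a function of the profiling library, and it assumes every column has already been classified as either "Numerical" or "Categorical".

```python
def auto_metric(type_a, type_b):
    """Return (method, value range) for a pair of column types.

    Hypothetical helper mirroring the mapping listed above; the order of
    the two arguments does not matter.
    """
    kinds = {type_a, type_b}
    if kinds == {"Numerical"}:
        return "Spearman's rho", (-1, 1)
    if kinds <= {"Numerical", "Categorical"}:
        # Covers Categorical-Categorical and Numerical-Categorical; in the
        # mixed case the numerical column would be discretized first.
        return "Cramer's V", (0, 1)
    raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")


for pair in [("Numerical", "Numerical"),
             ("Categorical", "Numerical"),
             ("Categorical", "Categorical")]:
    print(pair, "->", auto_metric(*pair))
```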
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
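As an illustration of that calculation, here is a minimal pure-Python sketch that ranks both variables (tied values share the mean of their ranks) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson` and `spearman` are made up for this example; a real analysis would normally rely on a statistics library instead.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]        # monotonic but nonlinear in x
print(spearman(x, y), pearson(x, y))
```

For this monotonic but nonlinear example the rank-based coefficient reaches its maximum, while Pearson's r on the raw values stays below it.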
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
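The invariance property can be checked numerically. The sketch below is a plain-Python illustration with made-up data; `pearson` is a helper defined here for the example, and the rescaling uses positive scale factors (a negative factor would flip the sign of r).

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Shift and rescale each variable separately (positive scale factors).
x_moved = [10.0 + 3.0 * a for a in x]
y_moved = [0.5 * b - 7.0 for b in y]

r_original = pearson(x, y)
r_moved = pearson(x_moved, y_moved)
print(r_original, r_moved)   # the two values agree up to rounding error
```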
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
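The pair-counting definition translates directly into code. The following sketch implements exactly the formula described above (often called tau-a); `kendall_tau_a` is a name chosen for this example. Statistics libraries commonly default to a tie-corrected variant (tau-b), so their results can differ from this one when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted in neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
# Of the 10 pairs, 8 are concordant and 2 are discordant.
tau = kendall_tau_a(x, y)
print(tau)
```

The double loop over all pairs is quadratic in the number of observations, which is fine for a demonstration but slow for a dataset of this size.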
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | MONTHS_BALANCE | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | 0 | C |
| 1 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -1 | C |
| 2 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -2 | C |
| 3 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -3 | C |
| 4 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -4 | C |
| 5 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -5 | C |
| 6 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -6 | C |
| 7 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -7 | C |
| 8 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -8 | C |
| 9 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -9 | C |
Last rows
| | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | MONTHS_BALANCE | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 777705 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -4 | 0 |
| 777706 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -5 | 0 |
| 777707 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -6 | 0 |
| 777708 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -7 | 0 |
| 777709 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -8 | 0 |
| 777710 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -9 | 0 |
| 777711 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -10 | 2 |
| 777712 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -11 | 1 |
| 777713 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -12 | 0 |
| 777714 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -13 | 0 |